Phrasal Translation for English-Chinese Cross Language Information Retrieval

نویسنده

  • Aitao Chen
چکیده

This paper introduces a simple and effective nonoverlapping unigram and bigram segmentation method for both monolingual Chinese and English-Chinese cross language retrieval. It also describes English-Chinese cross language retrieval experiments involving 54 topics and some 164,000 documents. The translation of English queries to Chinese is done using a Chinese-English dictionary of about 120,000 entries. A technique for extracting noun phrases is presented and applied prior to query translation. The phrasal translation outperformanced word translation by 23.6% even though most of the extracted noun phrases from the queries were not translated as phrase because of the limited coverage of the bilingual dictionary. The cross language retrieval achieved about 53% of the effectiveness of the monolingual retrieval, which suggests that there is lot of room for improvement. The two main limiting factors in English-Chinese retrieval performance are the limited coverage of the bilingual dictionary and the existence of multiple Chinese translation equivalents for many English words.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Supporting Multilingual Information Retrieval in Web Applications: An English-Chinese Web Portal Experiment

Cross-language information retrieval (CLIR) and multilingual information retrieval (MLIR) techniques have been widely studied, but they are not often applied to and evaluated for Web applications. In this paper, we present our research in developing and evaluating a multilingual English-Chinese Web portal in the business domain. A dictionary-based approach has been adopted that combines phrasal...

متن کامل

Experiments on Chinese-English Cross-language Retrieval at NTCIR-4

The AI Lab group participated in the crosslanguage retrieval task at NTCIR-4. Aiming at a practical retrieval system, our applied a dictionarybased approach incorporated with phrasal translation, co-occurrence disambiguation and query expansion techniques. Although experimental results were not as good as we expected, our study demonstrated the feasibility of applying CLIR techniques in real-wo...

متن کامل

Research on Lucene-based English-Chinese Cross-Language Information Retrieval

In this paper, we present our English-Chinese Cross-Language Information Retrieval (CLIR) system. We focus our attention on finding effective translation equivalents between English and Chinese, and improving the performance of Chinese IR. On English-Chinese CLIR, we adopt query translation as the dominant strategy, and utilize English-Chinese bilingual dictionary as the important knowledge res...

متن کامل

Exploiting the LDC Chinese-English Bilingual Wordlist for Cross Language Information Retrieval

We investigated using the LDC English/Chinese bilingual wordlists for English-Chinese cross language retrieval. It is shown that the Chinese-to-English wordlist can be considered as both a phrase and word dictionary, and is preferable to the English-to-Chinese version in terms of phrase translation and word translation selection. Additional techniques such as frequency-based term selection, tra...

متن کامل

English-Chinese Cross-Language Information Retrieval using Lucene Toolkit1

In this paper, we present our English-Chinese Cross-Language Information Retrieval (CLIR) system. We focus our attention on finding effective translation equivalents between English and Chinese, and improving the performance of Chinese IR. On English-Chinese CLIR, we adopt query translation as the dominant strategy, and utilize English-Chinese bilingual dictionary as the important knowledge res...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2000